    Analyzing the Amazon Mechanical Turk Marketplace

    Since the concept of crowdsourcing is relatively new, many potential participants have questions about the AMT marketplace. For example, common questions that come up in an 'introduction to crowdsourcing and AMT' session include: What types of tasks can be completed in the marketplace? How much does it cost? How fast can I get results back? How big is the AMT marketplace? The answers to these questions remain largely anecdotal, based on personal observations and experiences. To better understand what types of tasks are being completed today using crowdsourcing techniques, we started collecting data about the AMT marketplace. We present a preliminary analysis of the dataset and provide directions for interesting future research.

    A Framework for Quality Assurance in Crowdsourcing

    The emergence of online paid micro-crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), allows on-demand, at-scale distribution of tasks to human workers around the world. In such settings, online workers come and complete small tasks posted by a company, working for as long or as little as they wish. Such temporary employer-employee relationships give rise to adverse selection, moral hazard, and many other challenges. How can we ensure that the submitted work is accurate, especially when the cost of verification is comparable to the cost of performing the task? How can we estimate the quality exhibited by the workers? What pricing strategies should be used to induce effort from workers with varying ability levels? We develop a comprehensive framework for managing quality in such micro-crowdsourcing settings. First, we describe an algorithm for estimating the error rates of the participating workers, and show how to separate systematic worker biases from unrecoverable errors and generate an unbiased “worker quality” measurement. Next, we present a selective repeated-labeling algorithm that acquires labels so that quality requirements can be met at minimum cost. Then, we propose a quality-adjusted pricing scheme that adjusts the payment level according to the value contributed by each worker. We test our compensation scheme in a principal-agent setting in which workers respond to incentives by varying their effort. Our simulation results demonstrate that the proposed pricing scheme induces workers to exert higher levels of effort and yields larger profits for employers than the commonly adopted uniform pricing schemes. We also describe strategies that build on our quality control and pricing framework to tackle crowdsourced tasks of increasingly higher complexity, while still maintaining tight quality control of the process.
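
    As a concrete illustration of the error-rate estimation step, here is a minimal sketch that runs an EM loop in the spirit of the classic Dawid-Skene estimator on synthetic binary labels. The dense label matrix, the synthetic worker accuracies, and the fixed iteration count are assumptions of the example, not details from the paper; the paper's algorithm additionally separates recoverable bias from unrecoverable error, which this toy version does not.

    import numpy as np

    # Synthetic setup: hidden true labels and five workers with unknown accuracies.
    rng = np.random.default_rng(0)
    n_items, n_workers = 200, 5
    truth = rng.integers(0, 2, n_items)              # hidden true labels
    acc = rng.uniform(0.6, 0.95, n_workers)          # hidden per-worker accuracies
    labels = np.where(rng.random((n_items, n_workers)) < acc,
                      truth[:, None], 1 - truth[:, None])

    # Initialize the posterior over true labels with the (soft) majority vote.
    post = labels.mean(axis=1)                       # P(true label = 1) per item

    for _ in range(50):
        # M-step: per-worker confusion entries, weighted by the current posterior.
        w1 = post[:, None]
        w0 = 1.0 - w1
        pi1 = (w1 * labels).sum(0) / w1.sum()        # P(worker says 1 | truth = 1)
        pi0 = (w0 * labels).sum(0) / w0.sum()        # P(worker says 1 | truth = 0)
        prior = post.mean()
        # E-step: recompute each item's posterior from the worker responses.
        like1 = prior * np.prod(np.where(labels == 1, pi1, 1 - pi1), axis=1)
        like0 = (1 - prior) * np.prod(np.where(labels == 1, pi0, 1 - pi0), axis=1)
        post = like1 / (like1 + like0)

    # Recovered worker quality: balanced accuracy from the confusion entries.
    est_acc = (pi1 + (1 - pi0)) / 2
    print("estimated:", np.round(est_acc, 2))
    print("true:     ", np.round(acc, 2))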

    Classification-Aware Hidden-Web Text Database Selection

    Many valuable text databases on the web have noncrawlable contents that are “hidden” behind search interfaces. Metasearchers are helpful tools for searching over multiple such “hidden-web” text databases at once through a unified query interface. An important step in the metasearching process is database selection: determining which databases are the most relevant for a given user query. State-of-the-art database selection techniques rely on statistical summaries of the database contents, generally including the database vocabulary and associated word frequencies. Unfortunately, hidden-web text databases typically do not export such summaries, so previous research has developed algorithms for constructing approximate content summaries from document samples extracted from the databases via querying. We present a novel “focused-probing” sampling algorithm that detects the topics covered in a database and adaptively extracts documents that are representative of the topic coverage of the database. Our algorithm is the first to construct content summaries that include the frequencies of the words in the database. Unfortunately, Zipf’s law practically guarantees that for any relatively large database, content summaries built from moderately sized document samples will fail to cover many low-frequency words; in turn, incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To enhance the sparse document samples and improve the database selection decisions, we exploit the fact that topically similar databases tend to have similar vocabularies, so samples extracted from databases with a similar topical focus can complement each other. We have developed two database selection algorithms that exploit this observation. The first algorithm proceeds hierarchically: it selects the best categories for a query and then sends the query to the appropriate databases in the chosen categories. The second algorithm uses “shrinkage,” a statistical technique for improving parameter estimation in the face of sparse data, to enhance the database content summaries with category-specific words. We describe how to modify existing database selection algorithms to adaptively decide (at runtime) whether shrinkage is beneficial for a query. A thorough evaluation over a variety of databases, including 315 real web databases as well as TREC data, suggests that the proposed sampling methods generate high-quality content summaries and that the database selection algorithms produce significantly more relevant database selection decisions and overall search results than existing algorithms.
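
    To make the shrinkage idea concrete, the following sketch smooths a sparse sampled content summary toward the aggregate summary of its category. The vocabulary, the counts, and the fixed mixing weight lam are illustrative assumptions; the described algorithms decide adaptively, per query, whether and how much shrinkage to apply.

    # Illustrative shrinkage of a sparse sampled content summary toward its
    # category's summary: p_shrunk(w) = lam * p_db(w) + (1 - lam) * p_cat(w).
    from collections import Counter

    db_sample = Counter({"basketball": 40, "score": 25, "coach": 10})
    category = Counter({"basketball": 300, "score": 200, "coach": 150,
                        "playoffs": 120, "referee": 80})  # topically similar dbs

    def shrunk_summary(sample, cat, lam=0.7):
        n_s, n_c = sum(sample.values()), sum(cat.values())
        vocab = set(sample) | set(cat)                # Counter returns 0 if absent
        return {w: lam * sample[w] / n_s + (1 - lam) * cat[w] / n_c
                for w in vocab}

    summary = shrunk_summary(db_sample, category)
    # "playoffs" now gets nonzero probability even though the sample missed it.
    print(sorted(summary.items(), key=lambda kv: -kv[1]))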

    Modeling Dependency in Prediction Markets

    In the last decade, prediction markets have become popular forecasting tools in areas ranging from election results to movie revenues and Oscar nominations. One of the features that makes prediction markets particularly attractive for decision support applications is that they can be used to answer what-if questions and estimate probabilities of complex events. The traditional approach to answering such questions involves running a combinatorial prediction market, which is not always possible. In this paper, we present an alternative, statistical approach to pricing complex claims, based on analyzing co-movements of prediction market prices for basis events. Experimental evaluation of our technique on a collection of 51 InTrade contracts representing the Democratic Party nominee winning the Electoral College votes of a particular state shows that the approach outperforms traditional forecasting methods such as price and return regressions, and can be used to extract meaningful business intelligence from raw price data.
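
    One plausible (but assumed) reading of pricing complex claims from co-movements is a Gaussian-copula calculation: estimate the correlation of the basis contracts' latent moves from their price histories, then price the conjunction of the two events. The simulated price series and the copula choice below are illustrative, not the paper's actual estimator.

    import numpy as np
    from scipy.stats import norm, multivariate_normal

    # Simulate daily prices of two correlated binary contracts A and B.
    rng = np.random.default_rng(1)
    z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=250)
    prices = norm.cdf(np.cumsum(z, axis=0) / 20)     # keep prices inside (0, 1)

    # Estimate co-movement from price changes mapped back to a latent scale.
    returns = np.diff(norm.ppf(prices), axis=0)
    rho = np.corrcoef(returns.T)[0, 1]

    # Price the conjunction with a zero-mean Gaussian copula at current prices.
    p_a, p_b = prices[-1]
    joint = multivariate_normal(cov=[[1, rho], [rho, 1]]).cdf(
        [norm.ppf(p_a), norm.ppf(p_b)])
    print(f"P(A)={p_a:.2f}  P(B)={p_b:.2f}  rho={rho:.2f}  P(A and B)={joint:.2f}")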

    Modeling Volatility in Prediction Markets

    Nowadays, there is significant experimental evidence of excellent ex-post predictive accuracy in certain types of prediction markets, such as markets for elections. This evidence shows that prediction markets are efficient mechanisms for aggregating information and are more accurate in forecasting events than traditional forecasting methods, such as polls. The interpretation of prediction market prices as probabilities has been studied extensively in the literature; however, little attention has so far been given to understanding the volatility of prediction market prices. In this paper, we present a model of a prediction market with a binary payoff on a competitive event involving two parties. In our model, each party has some underlying “ability” process that describes its ability to win and evolves as an Ito diffusion. We show that if the prediction market for this event is efficient and accurate, the price of the corresponding contract will also follow a diffusion, and its instantaneous volatility is a particular function of the current claim price and its time to expiration. We generalize our results to competitive events involving more than two parties and show that the volatilities of prediction market contracts for such events are again functions of the current claim prices and the time to expiration, as well as of several additional parameters (ternary correlations of the underlying Brownian motions). In the experimental section, we validate our model on a set of InTrade prediction markets and show that it is consistent with observed volatilities of contract returns and outperforms the well-known GARCH model in predicting future contract volatility from historical price data. To demonstrate the practical value of our model, we apply it to pricing options on prediction market contracts, such as those recently introduced by InTrade. Other potential applications of this model include detecting significant market moves and improving forecast standard errors.
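
    For intuition, the sketch below evaluates one closed form consistent with the abstract's claim that instantaneous volatility depends only on the current price p and the time to expiration tau: sigma(p, tau) = pdf(ppf(p)) / sqrt(tau), which is what the simplest two-party model with a Brownian ability gap yields. Treat this formula as an assumption of the example rather than the paper's exact result.

    import math
    from scipy.stats import norm

    # Assumed closed form: sigma(p, tau) = phi(Phi^{-1}(p)) / sqrt(tau), where
    # phi and Phi are the standard normal pdf and cdf.
    def instantaneous_volatility(p, tau):
        """Model-implied volatility of a binary claim priced at p, tau to expiry."""
        return norm.pdf(norm.ppf(p)) / math.sqrt(tau)

    for p in (0.1, 0.5, 0.9):
        print(p, round(instantaneous_volatility(p, tau=30.0), 4))
    # Volatility peaks at p = 0.5 and grows as tau -> 0, matching the familiar
    # late-campaign price swings of near-even contracts.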

    Estimating the Socio-Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics

    With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews typically published for a single product makes it harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we re-examine the impact of reviews on economic outcomes like product sales and examine how different factors affect social outcomes like the extent of their perceived usefulness. Our approach explores multiple aspects of review text at the lexical, grammatical, semantic, and stylistic levels to identify important text-based features. In addition, we examine multiple reviewer-level features, such as the average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that mix objective and highly subjective sentences have a negative effect on product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are considered more informative (or helpful) by users. Using Random Forest based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. Products that have received widely fluctuating reviews also tend to have reviews of widely fluctuating helpfulness. In particular, we find that highly detailed and readable reviews can receive low helpfulness votes when users vote negatively not to convey disapproval of the review quality but rather disapproval of the review polarity. We examine the relative importance of three broad feature categories: 'reviewer-related' features, 'review subjectivity' features, and 'review readability' features, and find that using any of the three feature sets results in statistically equivalent performance to using all available features. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their socio-economic impact. Our results can have implications for the judicious design of opinion forums.
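
    A minimal sketch of the predictive-modeling step described above, with crude synthetic stand-ins for the subjectivity, readability, and reviewer-history features. The feature definitions, the data, and the thresholds are all assumptions of this example, not the paper's.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    n = 500
    X = np.column_stack([
        rng.uniform(0, 1, n),      # share of subjective sentences
        rng.uniform(5, 25, n),     # avg words per sentence (readability proxy)
        rng.uniform(0, 1, n),      # reviewer's past average helpfulness
    ])
    # Synthetic ground truth: helpful reviews skew subjective, readable,
    # and come from reviewers with a good track record.
    y = ((0.8 * X[:, 0] - 0.02 * X[:, 1] + 0.6 * X[:, 2]
          + rng.normal(0, 0.2, n)) > 0.55).astype(int)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    print(cross_val_score(clf, X, y, cv=5).mean())   # cross-validated accuracy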

    The Dimensions of Reputation in Electronic Markets

    We analyze how different dimensions of a seller's reputation affect pricing power in electronic markets. We do so by using text mining techniques to identify and structure dimensions of importance from feedback posted on reputation systems, by aggregating and scoring these dimensions based on the sentiment they contain, and by using them to estimate a series of econometric models associating reputation with price premiums. We find that different dimensions do indeed affect pricing power differentially, and that a negative reputation hurts more than a positive one helps on some dimensions but not on others. We provide the first evidence that sellers of identical products in electronic markets differentiate themselves based on a distinguishing dimension of strength, and that buyers vary in the relative importance they place on different fulfilment characteristics. We highlight the importance of textual reputation feedback further by demonstrating that it substantially improves the performance of a classifier we have trained to predict future sales. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by reputation systems, and it presents new evidence of the importance of their effective and judicious design.
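
    A small sketch of the econometric step, assuming the text mining and sentiment scoring have already produced per-seller scores for a few hypothetical reputation dimensions (delivery, packaging, responsiveness). The numbers are made up, and the paper's models are richer than this plain OLS.

    import numpy as np

    # Rows: sellers. Columns: sentiment scores for delivery speed, packaging,
    # and responsiveness, as extracted from textual feedback (illustrative).
    dims = np.array([[ 0.8,  0.2,  0.5],
                     [-0.4,  0.1,  0.3],
                     [ 0.6, -0.2,  0.7],
                     [ 0.1,  0.5, -0.1],
                     [-0.7, -0.3,  0.2]])
    premium = np.array([4.1, -1.2, 3.5, 0.9, -2.0])   # observed price premiums

    X = np.column_stack([np.ones(len(dims)), dims])    # add intercept
    coef, *_ = np.linalg.lstsq(X, premium, rcond=None)
    print(dict(zip(["const", "delivery", "packaging", "responsiveness"],
                   np.round(coef, 2))))
    # The fitted coefficients play the role of per-dimension pricing power.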

    Get Another Label? Improving Data Quality and Data Mining

    This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity and show several main results: (i) Repeated labeling can improve label and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
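
    Simple binomial arithmetic shows why repeated labeling helps only sometimes, which is result (i) above. The sketch below computes the probability that a majority vote over n independent labels is correct, given individual labeler accuracy p; it illustrates the plain "label everything multiple times" baseline, not the paper's selective strategy.

    from math import comb

    def majority_quality(p, n):
        """P(majority of n independent labels is correct), n odd."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    for p in (0.55, 0.7, 0.9):
        print(p, [round(majority_quality(p, n), 3) for n in (1, 3, 5, 11)])
    # At p = 0.9 extra labels buy little; at p = 0.55 even 11 labels leave
    # substantial noise, which is why selective reacquisition pays off.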

    Demographics of Mechanical Turk

    We present the results of a survey that collected information about the demographics of participants on Amazon Mechanical Turk, together with information about their level of activity and motivations for working on the platform. We find that approximately 50% of the workers come from the United States and 40% come from India. Country of origin tends to shape workers' motivations for participating in the marketplace: significantly more workers from India participate on Mechanical Turk because the marketplace is a primary source of income, while in the US most workers consider it a secondary source of income. While money is a primary motivation for participating in the marketplace, workers also cite a variety of other motivations, including entertainment and education.